A novel language model based on self-organized learning

Authors

  • Taiyi Huang
  • Langzhou Chen
Abstract

Statistical language models are very important to speech recognition. For a system devoted to a specific topic, a domain-dependent language model performs much better than a general model. The traditional method of training topic-dependent models has two problems. First, the corpus available for a specific topic is much smaller than the general corpus. Second, an individual article usually relates to more than one topic, and the traditional method does not take this into account. This paper tries to solve these two problems. We present a new method of organizing the corpus, based on fuzzy training subsets, and the domain-dependent models are trained on these fuzzy subsets. At the same time, a self-organized learning approach is introduced into the training process to improve the models' predictive ability. Self-organized learning noticeably improves the performance of the models.

1. INTRODUCTION

In speech recognition, statistical n-gram models have been used successfully to guide the search and to score a path of word strings [1]. However, a general language model cannot use the topic information of the speech content efficiently, so its performance drops when it is applied to a specific domain. A topic-dependent language model is an effective way to get better performance in a specific domain.

There are two ways to build a topic-dependent language model. One is the mixture model [2]. In this structure, the language models of the different topics are interpolated together according to their mixing factors, which can be expressed as:

$$p(w_i \mid w_{i-1}) = \sum_k x_k \, p_k(w_i \mid w_{i-1}) \qquad (1)$$

where $x_k$ is the mixing factor of topic $k$ and $p_k(w_i \mid w_{i-1})$ is the language model of topic $k$.

The other is the single-model structure [3]. In this structure, the topic-dependent model is interpolated with a general language model, which can be expressed as:

$$p(w_i \mid w_{i-1}) = x \, p_g(w_i \mid w_{i-1}) + (1 - x) \, p_k(w_i \mid w_{i-1}) \qquad (2)$$

where $x$ is the weighting factor and $p_g(w_i \mid w_{i-1})$ is a general language model.
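The two interpolation schemes in equations (1) and (2) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the paper: the helper `table_model`, the function names `mixture_prob` and `single_model_prob`, the floor value for unseen bigrams, and the toy probabilities.

```python
from typing import Callable, Dict, Sequence, Tuple

# A bigram model is represented as a function p(w, prev) -> probability.
BigramModel = Callable[[str, str], float]


def table_model(table: Dict[Tuple[str, str], float], floor: float = 1e-6) -> BigramModel:
    """Hypothetical helper: build a bigram model from a {(prev, w): prob} table.

    Unseen bigrams get a small floor probability (an illustrative choice,
    not a real smoothing scheme).
    """
    def p(w: str, prev: str) -> float:
        return table.get((prev, w), floor)
    return p


def mixture_prob(w: str, prev: str,
                 topic_models: Sequence[BigramModel],
                 mixing_factors: Sequence[float]) -> float:
    """Equation (1): sum over topics k of x_k * p_k(w | prev)."""
    return sum(x_k * p_k(w, prev) for x_k, p_k in zip(mixing_factors, topic_models))


def single_model_prob(w: str, prev: str,
                      general_model: BigramModel,
                      topic_model: BigramModel,
                      x: float) -> float:
    """Equation (2): x * p_g(w | prev) + (1 - x) * p_k(w | prev)."""
    return x * general_model(w, prev) + (1.0 - x) * topic_model(w, prev)


# Toy models with made-up probabilities, for illustration only.
sports = table_model({("the", "game"): 0.4})
finance = table_model({("the", "game"): 0.1})
general = table_model({("the", "game"): 0.2})

p_mix = mixture_prob("game", "the", [sports, finance], [0.75, 0.25])
p_single = single_model_prob("game", "the", general, sports, x=0.5)
```

In both schemes the weights are assumed to be non-negative and to sum to one, so the interpolated value remains a valid probability.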
To introduce topic-dependent language models into speech recognition, two problems must be solved. The first is how to update the mixing factors. This problem can be solved by the EM algorithm [4], in which the mixing factors can be estimated as: x i M x i p w h x j p w h n n i n m n m
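The excerpt breaks off before the update formula is complete, so the authors' exact expression is not reproduced here. As a hedged illustration only, the textbook EM re-estimation of mixture weights sets each new weight to the average, over the M words of the adaptation text, of that component's posterior share: the new x_k is (1/M) times the sum over positions m of x_k p_k(w_m | h_m) divided by the sum over j of x_j p_j(w_m | h_m). The function name `em_update_mixing_factors` and the input layout are assumptions for this sketch.

```python
from typing import List, Sequence


def em_update_mixing_factors(mixing_factors: Sequence[float],
                             component_probs: Sequence[Sequence[float]]) -> List[float]:
    """One EM re-estimation step for mixture weights (standard textbook form).

    component_probs[m][k] holds p_k(w_m | h_m): the probability that topic
    model k assigns to the m-th word given its history.
    """
    num_words = len(component_probs)
    num_topics = len(mixing_factors)
    if num_words == 0:
        raise ValueError("need at least one word to re-estimate the weights")

    totals = [0.0] * num_topics
    for probs in component_probs:
        # Mixture probability of this word under the current weights.
        denom = sum(x * p for x, p in zip(mixing_factors, probs))
        if denom <= 0.0:
            raise ValueError("a word has zero probability under the mixture")
        # Accumulate each topic's posterior responsibility for this word.
        for k in range(num_topics):
            totals[k] += mixing_factors[k] * probs[k] / denom

    # Average the responsibilities over all words.
    return [t / num_words for t in totals]


# Toy example: two topics, two words, made-up component probabilities.
new_weights = em_update_mixing_factors([0.5, 0.5], [[0.4, 0.1], [0.2, 0.2]])
```

Repeating the step until the weights stop changing gives the usual EM iteration; the updated weights always sum to one because the responsibilities at each position do.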





Publication date: 1999